Search for: All records

Creators/Authors contains: "Surdeanu, Mihai"

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

ELLEN: Extremely Lightly Supervised Learning For Efficient Named Entity Recognition

Riaz, Haris; Dumitru, Razvan Gabriel; Surdeanu, Mihai (May 2024, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024))
Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.)
In this work, we revisit the problem of semi-supervised named entity recognition (NER) focusing on extremely light supervision, consisting of a lexicon containing only 10 examples per class. We introduce ELLEN, a simple, fully modular, neuro-symbolic method that blends fine-tuned language models with linguistic rules. These rules include insights such as ''One Sense Per Discourse'', using a Masked Language Model as an unsupervised NER, leveraging part-of-speech tags to identify and eliminate unlabeled entities as false negatives, and other intuitions about classifier confidence scores in local and global context. ELLEN achieves very strong performance on the CoNLL-2003 dataset when using the minimal supervision from the lexicon above. It also outperforms most existing (and considerably more complex) semi-supervised NER methods under the same supervision settings commonly used in the literature (i.e., 5% of the training data). Further, we evaluate our CoNLL-2003 model in a zero-shot scenario on WNUT-17 where we find that it outperforms GPT-3.5 and achieves comparable performance to GPT-4. In a zero-shot setting, ELLEN also achieves over 75% of the performance of a strong, fully supervised model trained on gold data. Our code is available at: https://github.com/hriaz17/ELLEN
more » « less
Full Text Available
Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification

https://doi.org/10.18653/v1/2024.findings-naacl.165

Vacareanu, Robert; Alam, Fahmida; Islam, Md Asiful; Riaz, Haris; Surdeanu, Mihai (June 2024, Association for Computational Linguistics)

This paper introduces a novel neuro-symbolic architecture for relation classification (RC) that combines rule-based methods with contemporary deep learning techniques. This approach capitalizes on the strengths of both paradigms: the adaptability of rule-based systems and the generalization power of neural networks. Our architecture consists of two components: a declarative rule-based model for transparent classification and a neural component to enhance rule generalizability through semantic text matching. Notably, our semantic matcher is trained in an unsupervised domain-agnostic way, solely with synthetic data. Further, these components are loosely coupled, allowing for rule modifications without retraining the semantic matcher. In our evaluation, we focused on two few-shot relation classification datasets: Few-Shot TACRED and a Few-Shot version of NYT29. We show that our proposed method outperforms previous state-of-the-art models in three out of four settings, despite not seeing any human-annotated training data. Further, we show that our approach remains modular and pliable, i.e., the corresponding rules can be locally modified to improve the overall model. Human interventions to the rules for the TACRED relation org:parents boost the performance on that relation by as much as 26% relative improvement, without negatively impacting the other relations, and without retraining the semantic matching component.
more » « less
Full Text Available
Deep Learning for Natural Language Processing: A Gentle Introduction

Surdeanu, Mihai; Valenzuela-Escárcega, Marco (February 2024, Cambridge University Press)

Upon encountering this publication, one might ask the obvious question, "Why do we need another deep learning and natural language processing book?" Several excellent ones have been published, covering both theoretical and practical aspects of deep learning and its application to language processing. However, from our experience teaching courses on natural language processing, we argue that, despite their excellent quality, most of these books do not target their most likely readers. The intended reader of this book is one who is skilled in a domain other than machine learning and natural language processing and whose work relies, at least partially, on the automated analysis of large amounts of data, especially textual data. Such experts may include social scientists, political scientists, biomedical scientists, and even computer scientists and computational linguists with limited exposure to machine learning. Existing deep learning and natural language processing books generally fall into two camps. The first camp focuses on the theoretical foundations of deep learning. This is certainly useful to the aforementioned readers, as one should understand the theoretical aspects of a tool before using it. However, these books tend to assume the typical background of a machine learning researcher and, as a consequence, I have often seen students who do not have this background rapidly get lost in such material. To mitigate this issue, the second type of book that exists today focuses on the machine learning practitioner; that is, on how to use deep learning software, with minimal attention paid to the theoretical aspects. We argue that focusing on practical aspects is similarly necessary but not sufficient. Considering that deep learning frameworks and libraries have gotten fairly complex, the chance of misusing them due to theoretical misunderstandings is high. We have commonly seen this problem in our courses, too. This book, therefore, aims to bridge the theoretical and practical aspects of deep learning for natural language processing. We cover the necessary theoretical background and assume minimal machine learning background from the reader. Our aim is that anyone who took introductory linear algebra and calculus courses will be able to follow the theoretical material. To address practical aspects, this book includes pseudo code for the simpler algorithms discussed and actual Python code for the more complicated architectures. The code should be understandable by anyone who has taken a Python programming course. After reading this book, we expect that the reader will have the necessary foundation to immediately begin building real-world, practical natural language processing systems, and to expand their knowledge by reading research publications on these topics. https://doi.org/10.1017/9781009026222
more » « less
Full Text Available
Active Learning Design Choices for NER with Transformers

Vacareanu, Robert; Noriega-Atala, Enrique; Hahn-Powell, Gus; Valenzuela-Escarcega, Marco A; Surdeanu, Mihai (May 2024, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024))
Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.)
We explore multiple important choices that have not been analyzed in conjunction regarding active learning for token classification using transformer networks. These choices are: (i) how to select what to annotate, (ii) decide whether to annotate entire sentences or smaller sentence fragments, (iii) how to train with incomplete annotations at token-level, and (iv) how to select the initial seed dataset. We explore whether annotating at sub-sentence level can translate to an improved downstream performance by considering two different sub-sentence annotation strategies: (i) entity-level, and (ii) token-level. These approaches result in some sentences being only partially annotated. To address this issue, we introduce and evaluate multiple strategies to deal with partially-annotated sentences during the training process. We show that annotating at the sub-sentence level achieves comparable or better performance than sentence-level annotations with a smaller number of annotated tokens. We then explore the extent to which the performance gap remains once accounting for the annotation time and found that both annotation schemes perform similarly.
more » « less
Full Text Available
Bootstrapping Neural Relation and Explanation Classifiers

Zheng, Tang; Surdeanu, Mihai (July 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL))

We introduce a method that self trains (or bootstraps) neural relation and explanation classifiers. Our work expands the supervised approach of (Tang and Surdeanu, 2022), which jointly trains a relation classifier with an explanation classifier that identifies context words important for the relation at hand, to semi- supervised scenarios. In particular, our approach iteratively converts the explainable mod- els’ outputs to rules and applies them to unlabeled text to produce new annotations. Our evaluation on the TACRED dataset shows that our method outperforms the rule-based model we started from by 15 F1 points, outperforms traditional self-training that relies just on the relation classifier by 5 F1 points, and performs comparatively with the prompt-based approach of Sainz et al. (2021) (without requiring an additional natural language inference component).
more » « less
Full Text Available
Improving Zero-shot Relation Classification via Automatically-acquired Entailment Templates

Rahimi Mahdi and Surdeanu Mihai (July 2023, Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP))

While fully supervised relation classification (RC) models perform well on large-scale datasets, their performance drops drastically in low-resource settings. As generating annotated examples are expensive, recent zero-shot methods have been proposed that reformulate RC into other NLP tasks for which supervision exists such as textual entailment. However, these methods rely on templates that are manually created which is costly and requires domain expertise. In this paper, we present a novel strategy for template generation for relation classification, which is based on adapting Harris’ distributional similarity principle to templates encoded using contextualized representations. Further, we perform empirical evaluation of different strategies for combining the automatically acquired templates with manual templates. The experimental results on TACRED show that our approach not only performs better than the zero-shot RC methods that only use manual templates, but also that it achieves state-of-the-art performance for zero-shot TACRED at 64.3 F1 score.
more » « less
Full Text Available
Authorbots

Bambauer, Derek E.; Surdeanu, Mihai (May 2023, Journal of Free Speech Law)
Volokh, Eugene (Ed.)
ChatGPT has exploded into the popular consciousness in recent months, and the hype and concerns about the program have only grown louder with the release of GPT-4, a more powerful version of the software. Its deployment, including with applications such as Microsoft Office, has raised questions about whether the developers or distributors of code that includes ChatGPT, or similar generative pre-trained transformers, could face liability for tort claims such as defamation or false light. One important potential barrier to these claims is the immunity con-ferred by 47 U.S.C. § 230, popularly known as “Section 230.” In this Essay, we make two claims. First, Section 230 is likely to protect the creators, distributors, and hosts of online services that include ChatGPT in many cases. Users of those services, though, may be at greater legal risk than is commonly believed. Second, ChatGPT and its ilk make the analysis of the Section 230 safe harbor more com-plex, both substantively and procedurally. This is likely a negative consequence for the software’s developers and hosts, since complexity in law tends to generate uncertainty, which in turn creates cost. Nonetheless, we contend that Section 230 has more of a role to play in legal questions about ChatGPT than most commentators do—including the principal legislative drafters of Section 230—and that this result is generally a desirable one.
more » « less
Full Text Available
Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning (PAN-DL 2023)

Surdeanu, Mihai; Riloff, Ellen; Chiticariu, Laura; Frietag, Dayne; Hahn-Powell, Gus; Morrison, Clayton T; Noriega-Atala, Enrique; Sharp, Rebecca; Valenzuela-Escárcega, Marco (December 2023, Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning)

Message from the Organizers Welcome to the second edition of the Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning (Pan-DL)! Our workshop is being organized in a hybrid format on December 6, 2023, in conjunction with the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). In the past year, the natural language processing (NLP) field (and the world at large!) has been hit by the large language model (LLM) "tsunami." This happened for the right reasons: LLMs perform extremely well in a multitude of NLP tasks, often with minimal training and, perhaps for the first time, have made NLP technology extremely approachable to non-expert users. However, LLMs are not perfect: they are not really explainable, they are not pliable, i.e., they cannot be easily modified to correct any errors observed, and they are not efficient due to the overhead of decoding. In contrast, rule-based methods are more transparent to subject matter experts; they are amenable to having a human in the loop through intervention, manipulation and incorporation of domain knowledge; and further the resulting systems tend to be lightweight and fast. This workshop focuses on all aspects of rule-based approaches, including their application, representation, and interpretability, as well as their strengths and weaknesses relative to state-of-the-art machine learning approaches. Considering the large number of potential directions in this neuro-symbolic space, we emphasized inclusivity in our workshop. We received 19 submissions and accepted 10 for oral presentation. This resulted in an overall acceptance rate of 52%. Our workshop also includes 6 presentations of papers that were accepted in Findings of EMNLP. In addition to the oral presentations of the accepted papers, our workshop includes a keynote talk by Yunyao Li, who has made many important contributions to the field of symbolic approaches for natural language processing. Further, the workshop contains a panel that will discuss the merits and limitations of rules in the new LLM era. The panelists will be academics with expertise in both neural- and rulebased methods, industry experts that employ these methods for commercial products, and subject matter experts that have used rule-based methods for domain-specific applications. We thank Yunyao Li and the panelists for their important contribution to our workshop! Finally, we are thankful to the members of the program committee for their insightful reviews! We are confident that all submissions have benefited from their expert feedback. Their contribution was a key factor for accepting a diverse and high-quality list of papers, which we hope will make the first edition of the Pan-DL workshop a success, and will motivate many future editions. Pan-DL 2023 Organizers December 6, 2023
more » « less
Full Text Available
It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers

https://doi.org/10.1162/coli_a_00463

Tang, Zheng; Surdeanu, Mihai (January 2023, Computational Linguistics)

Abstract We propose an explainable approach for relation extraction that mitigates the tension between generalization and explainability by jointly training for the two goals. Our approach uses a multi-task learning architecture, which jointly trains a classifier for relation extraction, and a sequence model that labels words in the context of the relations that explain the decisions of the relation classifier. We also convert the model outputs to rules to bring global explanations to this approach. This sequence model is trained using a hybrid strategy: supervised, when supervision from pre-existing patterns is available, and semi-supervised otherwise. In the latter situation, we treat the sequence model’s labels as latent variables, and learn the best assignment that maximizes the performance of the relation classifier. We evaluate the proposed approach on the two datasets and show that the sequence model provides labels that serve as accurate explanations for the relation classifier’s decisions, and, importantly, that the joint training generally improves the performance of the relation classifier. We also evaluate the performance of the generated rules and show that the new rules are a great add-on to the manual rules and bring the rule-based system much closer to the neural models.
more » « less
Full Text Available
NEUROSTRUCTURAL DECODING: Neural Text Generation with Structural Constraints

https://doi.org/10.18653/v1/2023.acl-long.528

Bastan, Mohaddeseh; Surdeanu, Mihai; Balasubramanian, Niranjan (January 2023, 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available

« Prev Next »